Corpus-Based Identification of Non-Anaphoric Noun Phrases

Authors

  • David L. Bean
  • Ellen Riloff
Abstract

Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, which has the potential to improve the efficiency and accuracy of coreference resolution systems. Our algorithm generates lists of non-anaphoric noun phrases and noun phrase patterns from a training corpus and uses them to recognize non-anaphoric noun phrases in new texts. Using 1600 MUC-4 terrorism news articles as the training corpus, our approach achieved 78% recall and 87% precision at identifying such noun phrases in 50 test documents.

1 Introduction

Most automated approaches to coreference resolution attempt to locate an antecedent for every potentially coreferent discourse entity (DE) in a text. The problem with this approach is that a large number of DEs may not have antecedents. While some discourse entities such as pronouns are almost always referential, definite descriptions¹ may not be. Earlier work found that nearly 50% of definite descriptions had no prior referents (Vieira and Poesio, 1997), and we found that number to be even higher, 63%, in our corpus. Some non-anaphoric definite descriptions can be identified by looking for syntactic clues like attached prepositional phrases or restrictive relative clauses. But other definite descriptions are non-anaphoric because readers understand their meaning due to common knowledge. For example, readers of this paper will probably understand the real world referents of "the F.B.I.," "the White House," and "the Golden Gate Bridge." These are instances of definite descriptions that a coreference resolver does not need to resolve because they each fully specify a cognitive representation of the entity in the reader's mind.

¹ In this work, we define a definite description to be a noun phrase beginning with "the".

One way to address this problem is to create a list of all non-anaphoric NPs that could be used as a filter prior to coreference resolution, but hand coding such a list is a daunting and intractable task. We propose a corpus-based mechanism to identify non-anaphoric NPs automatically. We will refer to non-anaphoric definite noun phrases as existential NPs (Allen, 1995). Our algorithm uses statistical methods to generate lists of existential noun phrases and noun phrase patterns from a training corpus. These lists are then used to recognize existential NPs in new texts.

2 Prior Research

Computational coreference resolvers fall into two categories: systems that make no attempt to identify non-anaphoric discourse entities prior to coreference resolution, and those that apply a filter to discourse entities, identifying a subset of them that are anaphoric. Those that do not practice filtering include decision tree models (Aone and Bennett, 1996), (McCarthy and Lehnert, 1995) that consider all possible combinations of potential anaphora and referents. Exhaustively examining all possible combinations is expensive and, we believe, unnecessary.

Of those systems that apply filtering prior to coreference resolution, the nature of the filtering varies. Some systems recognize when an anaphor and a candidate antecedent are incompatible. In SRI's probabilistic model (Kehler,
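The list-based filtering idea from the introduction (recognizing existential NPs with lists of noun phrases and noun phrase patterns, before coreference resolution runs) can be illustrated with a small sketch. This is not the authors' implementation: the lists here are hand-written stand-ins for what the algorithm would learn from a training corpus, a "pattern" is assumed to be nothing more than a head noun (the last word of the NP), and every name (`EXISTENTIAL_NPS`, `HEAD_PATTERNS`, `is_existential`, `precision_recall`) is hypothetical. The sketch also shows how recall and precision, the two measures reported in the abstract, are computed for such a filter.

```python
def normalize(np_text: str) -> str:
    """Lowercase and collapse whitespace so list lookups are consistent."""
    return " ".join(np_text.lower().split())


def is_definite(np_text: str) -> bool:
    """A definite description is taken to be an NP beginning with 'the'."""
    return normalize(np_text).startswith("the ")


def is_existential(np_text: str, existential_nps: set, head_patterns: set) -> bool:
    """Return True if a definite NP matches the NP list or a head-noun pattern.

    Assumption: a pattern is just the final word of the NP. The patterns
    produced by the paper's algorithm may be richer than this.
    """
    norm = normalize(np_text)
    if not is_definite(norm):
        return False
    if norm in existential_nps:
        return True
    return norm.split()[-1] in head_patterns


def precision_recall(predicted: set, gold: set) -> tuple:
    """Precision = correct / predicted; recall = correct / gold."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hand-written stand-ins for lists that would be learned from a corpus.
EXISTENTIAL_NPS = {"the white house", "the f.b.i."}
HEAD_PATTERNS = {"media"}

candidates = [
    "the White House", "the news media", "the attack",
    "the F.B.I.", "the Golden Gate Bridge", "a bomb",
]

# NPs the filter would remove before coreference resolution.
predicted = {
    normalize(np_text)
    for np_text in candidates
    if is_existential(np_text, EXISTENTIAL_NPS, HEAD_PATTERNS)
}

# Toy gold standard: which candidates are truly existential.
gold = {
    "the white house", "the news media",
    "the f.b.i.", "the golden gate bridge",
}

precision, recall = precision_recall(predicted, gold)
print(sorted(predicted))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy run the filter misses "the Golden Gate Bridge" because it appears in neither list, which lowers recall without affecting precision. The remaining definite NPs, such as "the attack", would still be passed to the coreference resolver.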


Similar resources

Corpus-Based Identification of Non-Anaphoric Noun Phrases

Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, w...

Nominal Expressions in Multilingual Corpora: Definites and Demonstratives

This paper presents the results of a multilingual corpus study on definite descriptions and demonstrative noun phrases. The analysis made on a parallel corpus (French and Portuguese) reinforces previous findings regarding the predominance of non-anaphoric uses of definite descriptions in English corpus. It is also shown that the use of demonstrative noun phrases, on the other hand, is more regu...

Identifying Anaphoric and Non-Anaphoric Noun Phrases to Improve Coreference Resolution

We present a supervised learning approach to identification of anaphoric and non-anaphoric noun phrases and show how such information can be incorporated into a coreference resolution system. The resulting system outperforms the best MUC-6 and MUC-7 coreference resolution systems on the corresponding MUC coreference data sets, obtaining F-measures of 66.2 and 64.0, respectively.

Global Learning of Noun Phrase Anaphoricity in Coreference Resolution via Label Propagation

Knowledge of noun phrase anaphoricity might be profitably exploited in coreference resolution to bypass the resolution of non-anaphoric noun phrases. However, it is surprising to notice that recent attempts to incorporate automatically acquired anaphoricity information into coreference resolution have been somewhat disappointing. This paper employs a global learning method in determining the an...

A Study of Anaphoric Expressions in Human Produced Scientific Abstracts

One of the main reasons for having low quality automatic extracts is the presence of dangling anaphors. This paper analyses the referential expressions in a corpus of human written scientific summaries and tries to identify ways for improving the quality of automatic extracts. By recording the distance between the anaphoric expressions and their referents we noticed that humans do not use an ag...


Publication date: 1999